Annotating discourse markers in spontaneous speech corpora on an example for the Slovenian language
Abstract
Speech-to-speech translation technology has difficulties processing elements of spontaneity in conversation. We propose a discourse marker attribute in speech corpora to help overcome some of these problems. There have already been some attempts to annotate discourse markers in speech corpora. However, because there is no consensus on which expressions count as discourse markers, we have to reconsider how to set up a framework for annotation, and, to better understand what is gained by introducing a discourse marker category, we have to analyse the characteristics and functions of discourse markers in discourse. This is especially important for languages such as Slovenian, where little or no research on discourse markers has been carried out. The aims of this paper are to present a scheme for annotating discourse markers, based on the analysis of a corpus of Slovenian telephone conversations in the tourism domain, and to give additional arguments, based on the characteristics and functions of discourse markers, that confirm their special status in conversation.
Similar resources
How Does Explicit and Implicit Instruction of Formal Meta-discourse Markers Affect Learners’ Oral Proficiency?
Meta-discourse markers are an inevitable part of oral proficiency which improve both the quality and comprehension of learners’ speech. While studies of oral meta-discourse have been conducted since the 1980s in a European or US context, they have remained relatively untouched in Iran. Therefore, this study aimed to seek the impact of both explicit and implicit teaching of formal meta-discourse...
Japanese Dialogue Corpus of Multi-Level Annotation
This paper describes a Japanese dialogue corpus annotated with multi-level information built by the Japanese Discourse Research Initiative, Japanese Society for Artificial Intelligence. The annotation information consists of speech, transcription delimited by slash units, prosodic, part of speech, dialogue acts and dialogue segmentation. In the project, we used the corpus for obtaining new find...
Spontaneous Speech Corpora for language learners of Spanish, Chinese and Japanese
This paper presents a method for designing, compiling and annotating corpora intended for language learners. In particular, we focus on spoken corpora for being used as complementary material in the classroom as well as in examinations. We describe the three corpora (Spanish, Chinese and Japanese) compiled by the Laboratorio de Lingüística Informática at the Autonomous University of Madrid (LLI...
Metadiscourse Markers in the Abstract Sections of Persian and English Law Articles
Abstracts are well-accepted as the clarity and fidelity of language in any article assists the readership to get the central points of the research in a brief but effective manner. Meanwhile, as a significant feature of any piece of discourse, meta-discourse markers can effectively render article abstract texts more reader-friendly and coherent. The present study aims at investigating the ext...
Journal title:
- Language Resources and Evaluation
Volume 41
Publication date: 2007